Reconsidering the significance of genomic word frequencies.
Abstract
By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence primarily follows a Pareto-lognormal distribution, which encapsulates lognormal and power-law features found across all known genomes. Such a distribution could be the result of completely random evolution by a copying process. Our characterization of the entire frequency distribution of genomic words opens the way to more accurate reasoning about their over- and underrepresentation in genomic sequences.
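To make the notion of a word frequency distribution concrete, the following is a minimal sketch in Python that counts overlapping words of length k and tabulates how many distinct words occur n times. It is an illustration only: the function names, the word length k = 6, and the baseline of independent, uniformly drawn bases are assumptions made here, and the sketch implements neither the copying process nor the Pareto-lognormal fit described in the abstract.

```python
import random
from collections import Counter


def count_words(sequence, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def occurrence_spectrum(word_counts):
    """Map each occurrence count n to the number of distinct words seen n times."""
    return Counter(word_counts.values())


# Hypothetical baseline (an assumption, not the paper's model):
# bases drawn independently and uniformly, with a fixed seed for repeatability.
rng = random.Random(0)
random_dna = "".join(rng.choice("ACGT") for _ in range(100_000))

counts = count_words(random_dna, k=6)
spectrum = occurrence_spectrum(counts)

# Show the low end of the spectrum: n occurrences, number of words with that count.
for n in sorted(spectrum)[:5]:
    print(n, spectrum[n])
```

Running the same two functions on a real genomic sequence and comparing the resulting spectrum with this baseline is one simple way to see how far observed word counts depart from independent random draws.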
Similar resources
Title: Reconsidering the Significance of Genomic Word Frequencies. Short Title: Genomic Word Frequencies.
NOTICE: this is the authors' version of a work that was accepted for publication in Trends in Genetics. Changes resulting from the publishing process such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Abstract: By conventiona...
Reconsidering Imamiyeh theologian's proofs for infallibility of Imam until the end of fifth century
This article has no abstract.
The Significance of Education and Gender in Persian Word-selection
This study strives to investigate the importance of ‘education’ and ‘gender’, as two major sociolinguistic variables, in accepting or rejecting the words coined by the Iranian Academy of Persian Language and Literature (APLL). A total of 500 students from state universities in Tehran were chosen as subjects and provided with a questionnaire consisting of 50 APLL equivalents. The respondents’ ac...
Long non-coding RNAs and their significance in human diseases
Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...
Association of PIT1 gene and milk protein percentage in Holstein cattle
The pituitary-specific transcription factor (PIT-1) gene is a candidate gene for growth, carcass and also for milk yield traits. In dairy farm animals, the main goal of the selection is the improvement of milk yield and composition. The genes of milk proteins and hormones are excellent candidate genes for linkage analysis with quantitative trait loci (QTL) because of their biological significan...
Journal title:
- Trends in genetics : TIG
Volume 23, Issue 11
Pages -
Publication date: 2007